Reliable protein folding on non-funneled energy landscapes: the free energy reaction path

A theoretical framework is developed to study the dynamics of protein folding. The key insight 
is that the search for the native protein conformation is influenced by the rate r at which external 
parameters, such as temperature, chemical denaturant or pH, are adjusted to induce folding. A 
theory based on this insight predicts that (1) proteins with non-funneled energy landscapes can fold 
reliably to their native state, (2) reliable folding can occur as an equilibrium or out-of-equilibrium 
process, and (3) reliable folding only occurs when the rate r is below a limiting value, which can 
be calculated from measurements of the free energy. We test these predictions against numerical 
simulations of model proteins with a single energy scale. 



Under appropriate conditions, proteins spontaneously 
fold from an extended one-dimensional chain of amino 
acids to a unique three-dimensional native conformation. 
How this occurs on timescales accessible to experiment — 
and relevant to biological function — is a question that has 
intrigued scientists for the past forty years. Levinthal [1] 
was the first to recognize the importance of timescales 
and point out that, assuming a random search of con- 
formation space, proteins would not fold in a person's 
lifetime. This argument has come to be known as 
Levinthal's Paradox since proteins must fold for human 
life to exist in the first place. 

Of course conformation space is not sampled randomly, and Levinthal's
paradox has been resolved by applying statistical mechanics to the
protein folding problem [H, [H, 0]. Each protein conformation has a free
energy that determines its probability to be sampled at temperature $T$.
While the free energy $F$ generally comprises a sum of many enthalpic and
entropic terms, it is convenient to express it as
$F = E - T S_{\mathrm{conf}}$, where $S_{\mathrm{conf}}$ is the
conformational entropy of only the protein degrees of freedom and $E$ is
the "internal energy" that includes all other contributions to the free
energy (from both protein and solvent). The functional dependence of $E$
on all protein degrees of freedom is called the energy landscape [H, 0],
which in general contains many minima. At $T = 0$ only the energy
landscape is relevant and the protein resides in a local (or global)
minimum, corresponding to a compact conformation. As $T$ increases, the
conformational entropy smooths out the minima in the energy landscape and
the protein adopts more extended states with larger $S_{\mathrm{conf}}$.
In the "new view" of protein folding [H, 0], statistical fluctuations on
an energy landscape give rise to an ensemble of folding pathways.

Often associated with the new view is the hypothesis that energy
landscapes have the shape of a multi-dimensional funnel [7, 8]. Proponents
argue that in order to fold reliably (transition to the native state with
probability one) the energy landscape must contain a single low-lying
minimum to which all conformations are channeled. If multiple funnels
exist, separated by large enough energy barriers, then at low temperature
or denaturant concentration a protein can become trapped in a local
minimum of energy that does not correspond to its native conformation.
While the existence of a single funnel is a sufficient condition for
reliable protein folding, the number of proteins with a single funnel is
expected to be small, and the observation of kinetic traps [9] and glassy
behavior [10] in biologically relevant proteins indicates that not all
proteins fold on smooth funneled landscapes.

Here we address the open question: is a funneled en- 
ergy landscape necessary for reliable folding? By formu- 
lating a statistical theory that includes the dynamics of 
folding, we find that a funneled landscape is not neces- 
sary for reliable folding. The important insight is that the 
rate r at which temperature or chemical denaturant con- 
centration is decreased to induce folding affects the final 
conformation of the protein. For sufficiently small r the 
protein always folds to its native conformation, whereas 
for larger r it can become trapped in a metastable state. 
This leads to new predictions that can be tested in experi- 
ments and simulations. First, proteins with non-funneled 
energy landscapes can fold reliably to their native state 
if the rate r is below a limiting value. Second, reliable 
folding can occur as an equilibrium-quasistatic or non- 
equilibrium process. Third, in a non-equilibrium folding 
process, a protein can reliably fold to a local (instead 
of global) minimum of the energy landscape. We con- 
duct off-lattice simulations of model proteins with non- 
funneled energy landscapes and verify these predictions. 



RESULTS 

We consider proteins with general energy landscapes — 
not necessarily funneled — and derive the conditions un- 
der which folding occurs reliably. Generally, energy land- 
scapes contain multiple minima, possibly separated by 
large energy barriers. Thus folding is not necessarily an 
equilibrium process and misfolds can occur. Below we 
consider the dynamics of the folding process and its effect
on reliable folding.



A kinetic mechanism for folding 

Multiple minima in the energy landscape lead to mul- 
tiple minima in the free energy. In this case we argue 
that there is a basic kinetic mechanism that determines 
whether folding is reliable. We illustrate this kinetic 
mechanism by considering a transition from state A to 
state B on a non-funneled energy landscape. Although 
we will assume that the transition is driven by a reduc- 
tion of temperature, the same arguments can be applied 
when a change of denaturant concentration or another 
parameter induces folding. 

In Fig. 1, schematic illustrations of the free energy are plotted at four
temperatures $T_1 > T_2 > T_3 > T_4$. We will assume that a transition
$A \to B$ is induced by decreasing the temperature at a constant rate $r$
such that $T(t) = T_1(1 - rt)$ as a function of time $t$. Initially, at
$T_1$, the protein resides in state $A$. As temperature is reduced to
$T_2$, an equilibrium transition to state $B$ can occur with folding time
proportional to $\exp(\Delta F/T_2)/r^*$, where $r^*$ is the rate at which
conformations are explored. At $T_3$ a third state $M$ has free energy
equal to that of $A$. As temperature is further reduced to $T_4$, the
minimum corresponding to state $A$ no longer exists and the activation
barrier $\Delta F'$ grows.

Dynamics are important in determining transitions between states $A$ and
$B$. If the time that it takes for the temperature to decrease from $T_2$
to $T_3$ is less than the folding time, the protein can fall into the
metastable state $M$. With $T(t) = T_1(1 - rt)$ the temperature takes a
time $(T_2 - T_3)/(T_1 r)$ to fall from $T_2$ to $T_3$, which sets a bound
on $r$: if

$$ r > r^f = \frac{(T_2 - T_3)\, r^*}{T_1} \exp\left(-\frac{\Delta F}{T_2}\right) \qquad (1) $$

then the protein is likely to populate the state $M$. Note that we use
units where Boltzmann's constant $k_B = 1$.

For a misfold to occur, the escape probability from the metastable state
must be sufficiently small. If the protein populates state $M$ at time
$t_3$, the probability that it has not yet escaped at time $t$ is given by

$$ P(t - t_3) = \exp\left[-\int_{t_3}^{t} dt'\, r^* \exp\left(-\Delta F'(T)/T\right)\right] = \exp\left(-g(t - t_3)\right), \qquad (2) $$

where $T = T(t')$ inside the integral. For a maximum waiting time $\tau$
the protein always escapes the metastable state for $g(\tau) \gg 1$ and
rarely escapes for $g(\tau) \ll 1$. The crossover between frequently
escaping from and being trapped in state $M$ occurs when
$g(\tau) \approx 1$. Using $T(t) = T_1(1 - rt)$ we find that when the rate

$$ r > r^s = r^* \int_{T(t_3 + \tau)}^{T_3} \exp\left(-\frac{\Delta F'(T)}{T}\right) \frac{dT}{T_1}, \qquad (3) $$

the probability to become trapped in the metastable state $M$ is
significant and misfolds occur [21].

From these basic considerations it is apparent that protein folding
transitions on non-funneled energy landscapes are influenced by multiple
minima in the free energy and the rate $r$ at which external parameters
are varied to induce folding. To determine whether reliable folding occurs
we must address two important questions: (i) can the protein conformation
reside in a metastable local minimum? and (ii) is it likely that the
protein conformation becomes trapped in that local minimum? The answers to
these questions define the limiting rates $r^f$ and $r^s$. The transition
$A \to B$ occurs reliably if $r$ obeys one of the inequalities, $r < r^f$
or $r < r^s$. In the case that $r < r^s$ the protein is given sufficient
time to sample all states and the transition $A \to B$ occurs reliably as
an equilibrium process. If $r^s < r < r^f$ the protein conformation becomes
trapped in the state $B$ without fully exploring phase space and the
transition occurs reliably, but out of equilibrium. If $r > r^f$ and
$r > r^s$ then the protein does not transition between $A$ and $B$
reliably.

FIG. 1: Schematic plots of the free energy versus an arbitrary reaction
coordinate at four temperatures where $T_1 > T_2 > T_3 > T_4$. At $T_1$
only the state $A$ is accessible. At $T_2$, transitions to state $B$ occur
with activation barrier $\Delta F$. $T_3$ is defined as the largest
temperature at which a new state $M$ exists with free energy equal to that
of state $A$. If the protein has not transitioned to state $B$ by $T_3$,
misfolds can occur. At $T_4$ the free energy barrier $\Delta F'(T)$
separating $M$ and $B$ becomes larger than it was at $T_3$.
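
The two limiting rates can be evaluated numerically once the barriers are known. The following is a minimal sketch, assuming the forms of Eqs. 1 and 3 for a linear temperature ramp $T(t) = T_1(1 - rt)$ with $k_B = 1$. The function names, the trapezoid-rule integration, and all numerical values (including the linear barrier model) are hypothetical and serve only as an illustration.

```python
import math


def limiting_rate_f(T1, T2, T3, r_star, dF):
    """Eq. 1 with k_B = 1: r^f = (T2 - T3) * r* * exp(-dF / T2) / T1."""
    return (T2 - T3) * r_star * math.exp(-dF / T2) / T1


def limiting_rate_s(T1, T3, T_end, r_star, barrier, n=2000):
    """Eq. 3: r^s = (r*/T1) * integral from T_end to T3 of exp(-dF'(T)/T) dT.

    `barrier` is a callable T -> dF'(T); the integral uses the trapezoid rule.
    T_end = T(t3 + tau) must be positive so that 1/T stays finite.
    """
    h = (T3 - T_end) / n
    total = 0.0
    for k in range(n + 1):
        T = T_end + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-barrier(T) / T)
    return r_star * total * h / T1


# Hypothetical numbers, chosen only to exercise the functions.
T1, T2, T3, T_end = 1.0, 0.9, 0.8, 0.3
r_star = 1.0
r_f = limiting_rate_f(T1, T2, T3, r_star, dF=4.0)
# Assumed barrier model: dF' grows linearly as T drops below T3.
r_s = limiting_rate_s(T1, T3, T_end, r_star,
                      barrier=lambda T: 5.0 + 10.0 * (T3 - T))
print(f"r^f = {r_f:.3e}, r^s = {r_s:.3e}")
```

Because the escape barrier $\Delta F'$ grows as the temperature falls, the integrand in Eq. 3 is dominated by temperatures close to $T_3$, so the result is insensitive to the exact choice of the lower limit.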



The Free Energy Reaction Path 

In the previous section we identified a kinetic mechanism that influences
transitions on non-funneled landscapes. In this section we use this
mechanism to formulate a general framework for understanding folding. We
begin by partitioning the energy landscape into basins associated with
particular protein topologies, proceed to define the free energy reaction
path that describes how the protein transitions from one topology to
another, and then use the kinetic mechanism described above to determine
whether folding is reliable.

As a way to understand complex folding dynamics, the energy landscape of
an arbitrary protein can be partitioned into basins surrounding each local
minimum, analogous to the inherent structure formalism for liquids and
glasses [11]. In particular, the infinite number of protein conformations
can be uniquely associated with a finite number of topologies, defined as
protein conformations that correspond to local minima of the internal
energy. We denote a topology as $t^n$, where $n$ is an index that contains
sufficient information to fully describe the conformation (e.g. number,
type and arrangement of bonds). The set of conformations $B(t^n)$
associated with each topology $t^n$ is the basin of attraction for that
topology. The basin of attraction is defined such that all conformations
that belong to $B(t^n)$ relax to the topology $t^n$ when thermal
fluctuations of the protein are suppressed. Thus the infinite number of
possible protein conformations is represented by a finite number of
topologies and a free energy $F(t^n)$ can be defined for the set of
protein conformations $B(t^n)$. Formally the partition function $Z(t^n)$
for conformations constrained to lie in $B(t^n)$ is given by



$$ Z(t^n) = \int_{B(t^n)} \exp(-E/T)\, d\Gamma, \qquad (4) $$

where integration is over all coordinates $\Gamma$ in the basin $B(t^n)$
and $E$ is the internal energy as a function of $\Gamma$. The free energy
for a protein constrained to $B(t^n)$ can then be written in terms of the
topology $t^n$ as

$$ F(t^n, T) = E(t^n, T) - T S_{\mathrm{conf}}(t^n, T), \qquad (5) $$

where $E(t^n, T)$ is the internal energy of topology $t^n$ and
$S_{\mathrm{conf}}(t^n, T)$ is its associated entropy [11], given by

$$ S_{\mathrm{conf}}(t^n, T) = \log \int_{B(t^n)} \exp\left(-[E - E(t^n, T)]/T\right) d\Gamma. \qquad (6) $$

The random coil state $t^0$ with zero internal energy has the largest
entropy and is therefore the global minimum of free energy at sufficiently
large temperature.

Given a protein with an energy landscape that has been partitioned into
basins of attraction, we define the free energy reaction path as the
ordered sequence of topologies that the protein adopts as temperature is
reduced in the equilibrium limit. That is, if the rate $r$ is sufficiently
small, the protein will come to equilibrium at all temperatures and
proceed through the basins of attraction for a reproducible set of
topologies $t^0 \to t^{n_1} \to t^{n_2} \to \cdots \to t^{n_N}$. Each
transition occurs at the temperature where the free energy of two
topologies is equal, e.g. the transition $t^0 \to t^{n_1}$ occurs at the
temperature $T^*$ where $F(t^0, T^*) = F(t^{n_1}, T^*)$. In this way, for
any energy landscape, the free energy reaction path encodes the path taken
through conformation space when folding occurs as an
equilibrium-quasistatic process.
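
The definition above can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that each topology has a temperature-independent internal energy and entropy, so that $F = E - TS$ is a straight line in $T$. It then records which topology is the global free energy minimum as $T$ is lowered. The topology names and the $(E, S)$ values are hypothetical and are not taken from the simulations.

```python
def reaction_path(topologies, T_high, T_low, steps=10000):
    """Ordered list of (name, T) at which each topology first becomes the
    global free-energy minimum as T is lowered from T_high to T_low.

    topologies: dict name -> (E, S), with F = E - T * S.
    E and S are treated as temperature independent (sketch assumption).
    """
    path = []
    for k in range(steps + 1):
        T = T_high - (T_high - T_low) * k / steps
        best = min(topologies,
                   key=lambda name: topologies[name][0] - T * topologies[name][1])
        if not path or path[-1][0] != best:
            path.append((best, T))
    return path


# Hypothetical landscape: a high-entropy coil, a metastable state, a native state.
toy = {"t0": (0.0, 10.0), "t3": (-3.0, 4.0), "t5": (-5.0, 1.0)}
path = reaction_path(toy, T_high=1.0, T_low=0.1)
print(path)
```

In this toy landscape the state labeled `t3` is never the global minimum, so it does not appear on the reaction path even though it can still act as a kinetic trap at finite rate.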

To determine whether folding is reliable, we apply the analysis introduced
in the previous section to each transition in the free energy reaction
path. If we label the transitions by $i = 1, 2, \ldots, N$ then limiting
rates $r_i^f$ and $r_i^s$ can be determined for each transition by
measuring properties of the free energy. There are then three distinct
folding scenarios: (1) if $r < r_i^s$ for all $i$ then the protein does not
become trapped in metastable conformations and folding occurs reliably in
equilibrium; (2) if $r_i^s < r < r_i^f$ for a single transition $i$ then
the protein falls out of equilibrium at transition $i$, but reliably folds
to the topology $t^{n_i}$ (since the condition $r < r_i^f$ guarantees that
the protein does not fall into a different metastable state). Note that if
there exist multiple transitions with $r_i^s < r < r_i^f$ then the protein
will reliably fold to the topology $t^{n_i}$ with the smallest value of $i$
for which this condition holds. Finally, (3) if $r > r_i^s$ and
$r > r_i^f$ for any $i$, and condition (2) does not hold for a smaller
value of $i$, then the protein will not fold reliably.
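
The three scenarios amount to a simple decision rule applied to the transitions in order. The sketch below is one possible encoding of that rule; the function name, the return labels, and the 0-based transition index are choices made here for illustration only.

```python
def classify_folding(r, r_f, r_s):
    """Classify folding at rate r given per-transition limiting rates.

    r_f[i], r_s[i]: limiting rates for transition i along the free energy
    reaction path (0-based index here, 1-based in the text).
    Returns (label, i), where i is the transition that decides the outcome.
    """
    for i, (rf, rs) in enumerate(zip(r_f, r_s)):
        if r < rs:
            continue  # transition i proceeds in equilibrium, move on
        if r < rf:
            # Scenario (2): falls out of equilibrium but reliably reaches
            # the target topology of transition i.
            return ("non-equilibrium", i)
        # Scenario (3): neither bound holds at the first critical transition.
        return ("unreliable", i)
    # Scenario (1): r is below r_s for every transition.
    return ("equilibrium", len(r_f) - 1)


# Limiting rates quoted in the text for the single transition of the
# two-dimensional model, in units of the inverse simulation time-unit.
r_f, r_s = [1.8e-7], [3.0e-8]
for rate in (1e-8, 1e-7, 1e-5):
    print(rate, classify_folding(rate, r_f, r_s))
```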

From our analysis we deduce that there are two types of 
reliable folding, equilibrium and non-equilibrium. While 
reliable equilibrium folding brings the protein to the 
global minimum of free energy, reliable non-equilibrium 
folding can target local minima. The free energy reaction 
path provides a useful framework to classify the relevant 
transitions since, depending on the rate r, a protein will 
either (1) pass through all topologies on the free energy 
reaction path and arrive at the topology with the small- 
est free energy, (2) target an intermediate topology along 
the free energy reaction path and reliably fold to a local 
minimum of free energy, or (3) misfold and deviate from 
the free energy reaction path. 



Simulations of a model protein 

To test the predictions of the previous section we perform off-lattice
Brownian dynamics simulations of a model protein with a single attractive
energy scale. We model the protein as a polymer chain containing both
attractive (green) and non-attractive (white) spherical monomers of size
$\sigma$. Interactions between non-adjacent green monomers are attractive
with energy depth $E_c < 0$, while interactions between non-adjacent pairs
of green-white or white-white monomers are purely repulsive. This model is
a variant of the "HP" model [13]. Thermal fluctuations of the protein at
temperature $T$ are included using Brownian dynamics simulations with
solvent viscosity $\eta$. We observe that as the parameter $c = |E_c|/T$
increases from zero the polymer chain transitions from a random coil to a
folded conformation. To test the predictions of the theory we simulate a
specific sequence of green and white monomers, pictured in Fig. 2. In this
article we present results for two dimensions in order to simplify
identification of the multiple topologies that the polymer chain adopts.
We have also conducted simulations in three dimensions and these results
are included in the supporting information.

FIG. 2: Contour plot of the energy landscape and pictures of the relevant
topologies for a model protein. The fully extended conformation is shown
at the top of the figure. The inset displays the full energy landscape and
the main figure contains a magnified view of the compact states. The
landscape is plotted as a function of the radius of gyration $R_g$ and
end-to-end distance $D$, each normalized by the monomer diameter. The
colorbar gives the total internal energy of the protein divided by the
attraction strength $|E_c|$. There are three distinct energy minima
separated by barriers and the associated topologies are pictured. White
regions correspond to protein conformations that are never sampled in the
simulations.

In Fig. 2 we plot the energy landscape of the polymer chain as a function
of two reaction coordinates: the radius of gyration $R_g$ and the
end-to-end distance $D$, each normalized by the monomer diameter $\sigma$.
In terms of these two reaction coordinates, three energy minima exist and
are separated by energy barriers. The minima correspond to three distinct
topologies that are pictured in Fig. 2. We find a total of four relevant
topologies for this simple system, containing either zero ($t^0$), three
($t^3$), four ($t^4$), or five ($t^5$) bonds between attractive green
monomers. Energy barriers exist between $t^3$, $t^4$ and $t^5$ because, in
order to transition between the topologies, it is necessary to first break
a bond and then rearrange the chain conformation. Note that four green
particles is the minimum number needed to ensure multiple energy minima in
two dimensions, while seven are required in three dimensions. Including
additional green particles introduces additional minima and more
complicated energy landscapes; we treat only the simplest case here.

Given the non-funneled energy landscape of the simulated protein we now
determine the associated free energy reaction path. Measurements of free
energy $F/T$, normalized by temperature, as a function of $E/|E_c|$ and
end-to-end distance $D$ are shown in Fig. 3 for a sequence of $c$-values
that corresponds to the sequence of schematic plots in Fig. 1. In
Fig. 3(a) we plot $F/T$ for a small value of $c = 0.0040$ and observe that
the random coil state $t^0$ is the only free energy minimum. In Fig. 3(b)
$c$ is increased to $c_2 = 0.0085$ and there are multiple local minima in
the free energy, including the topologies t , t , $t^3$, and $t^5$. The
free energies of $t^0$ and $t^5$ are equal in Fig. 3(b). At a slightly
higher value $c = c_3 = 0.0100$, Fig. 3(c) exhibits three minima and the
free energies of $t^0$ and $t^3$ are equal. Finally at $c = 0.0145$, the
free energy plotted in Fig. 3(d) exhibits a deep minimum at topology
$t^5$.

From the plots in Fig. 3 we conclude that the first and only transition in
the free energy reaction path is $t^0 \to t^5$, where the protein folds to
its native conformation. Although other local minima exist in the free
energy and misfolds are possible for $c > c_3$, $F(t^5)$ is the global
minimum of free energy for $c > c_2$. This simple polymer chain does not
exhibit any intermediate states on the free energy reaction path, which
prevents us from testing whether proteins can fold reliably to metastable
minima. However, we will test all other predictions of the theory. In the
Materials and Methods section we calculate the limiting rates
$r^f \eta\sigma^2/T = 1.8 \times 10^{-7}$ and
$r^s \eta\sigma^2/T = 3.0 \times 10^{-8}$ for the single transition on the
free energy reaction path, where $\eta\sigma^2/T$ is the simulation
time-unit.

Now that we have determined the free energy reaction path and calculated
the limiting rates, we conduct dynamic simulations of folding. To induce
folding in the polymer chain, $c$ is increased linearly in time at rate
$r$ ($c = rt$), starting from the topology $t^0$ at $c = 0$. In Fig. 4(a)
the energy of the polymer chain is plotted as a function of $c$ for three
different values of $r$, with the final state labeled by its topology.
From this figure we clearly see that small $r$ targets the native state
$t^5$ whereas larger $r$ leads to misfolding. In Fig. 4(b) we plot the
probability to fold to the native state $t^5$ as a function of
$r\eta\sigma^2/T$, averaged over many folding trajectories studied for
each $r$. The protein folds reliably for small rates.

The modern theory of protein folding requires funneled energy landscapes
for reliable folding [7, 8]. The simple protein model we consider here
contradicts this viewpoint since it does not possess a funneled landscape
but nevertheless folds reliably at small $r$. The free energy reaction path
theory predicts that reliable folding can occur on non-funneled landscapes
and provides a means to quantitatively determine the limiting rate below
which folding is reliable. Given the values of $r^f$ and $r^s$ quoted
above, the free energy reaction path theory predicts reliable folding for
$r\eta\sigma^2/T < 1.8 \times 10^{-7}$. In Fig. 4(b) we have measured that
reliable folding occurs for normalized rates less than
$\approx 10^{-7}$. The theory therefore makes a quantitative prediction
that agrees with the simulation results to within a factor of about two.
Additionally, the values of $r^f$ and $r^s$ indicate that there is a range
of rates $r^s < r < r^f$ where reliable folding to $t^5$ occurs out of
equilibrium. We test this prediction by measuring energy fluctuations for
rates at which folding is reliable, as plotted in Fig. 4(c). For
$r < r^s$ fluctuations are large at the transition point $c = 0.0085$
since the protein is sampling both folded and unfolded conformations as it
remains in equilibrium. For $r > r^s$ fluctuations remain small near the
transition point since the protein becomes trapped in the folded state and
reliable folding is a non-equilibrium process.

FIG. 3: Contour plots of the free energy $F/T$ normalized by temperature
as a function of $E/|E_c|$ (horizontal axis) and end-to-end distance $D$
(vertical axis) for a sequence of $c$-values. The free energy is
calculated from the probability for the protein to be in a conformation
with given $E/|E_c|$ and $D$. White regions correspond to protein
conformations that are never sampled in the simulations.



DISCUSSION 

Levinthal was the first to realize that the exponentially large number of
collapsed conformations precludes a protein from finding its native state
via random sampling.
The experimental observation that proteins fold reliably 
to a reproducible native state therefore requires an expla- 
nation. The modern view is that protein sequences have 
evolved to favor energy landscapes with a single funnel 
and can therefore fold reliably. We have demonstrated 
that proteins with non-funneled energy landscapes can 
also fold reliably, as long as the external parameters that 
induce folding are adjusted slowly enough. 

We have identified two reliable folding processes on non-funneled
landscapes: equilibrium and non-equilibrium. Even though it is possible
that in experimental and biological settings the rate at which external
parameters are varied to induce folding is too large to access the
equilibrium limit, reliable folding can occur out of equilibrium. If this
is the case, the native state should be regarded as a reliably targeted
local minimum on the free energy reaction path that remains metastable
over timescales sufficient for biological function.

The importance of the free energy reaction path and the necessity of using
small rates to vary external parameters present challenges for protein
folding simulations. Reliable protein folding is especially difficult to
study in all-atom simulations where, due to the long time scales and large
number of atoms, extremely rapid rates are used to induce folding [14].
Our results show that reliable folding on non-funneled landscapes depends
on rate; thus simulation studies that argue that funneled energy
landscapes are necessary for reliable folding [15] must be carefully
interpreted if only large rates are considered.

Our predictions can be tested in experiments by studying folding over a
range of rates, using methods such as ultrafast mixing or laser pulsing
[16]. Some progress has been made in this direction [17] and the
observation of "strange kinetics" [18] after rapid temperature jumps is
consistent with our predictions. In three dimensions the limiting rates
are proportional to $r^* \propto T/\eta R_h^3$, where $R_h$ is the
hydrodynamic radius. This implies that investigating folding in a variety
of solvents with different viscosities $\eta$ can greatly increase the
range of experimentally accessible rates. Moreover, due to the inverse
dependence on $T$, folding by changing temperature will give different
limiting rates than folding by reducing denaturant concentration.

Finally, it is intriguing to speculate about folding in vivo. Given that
the folded state of a protein is dependent on the rate at which external
parameters are varied to induce folding, and that local minima in free
energy can be targeted by adjusting this rate, it is possible that protein
sequence has evolved along with the biological environment in which it
folds. Since the folding process is determined by protein sequence and
rate, both are likely used in nature to ensure robust folding.

FIG. 4: (a) Folding trajectories from simulations with identical initial
conditions at three different rates. The normalized energy $E/|E_c|$ is
plotted as a function of $c$ and the final state is labeled by its
topology. Slow rates lead to the native state $t^5$ whereas fast rates
lead to unreliable folding. (b) The probability of folding to the native
state $P_c$ as a function of rate $r$. Error bars are from sampling
statistics. For $r\eta\sigma^2/T < 10^{-7}$ the protein folds reliably to
the topology $t^5$. Vertical lines indicate the values of $r^f$ and $r^s$
calculated in the text. (c) Energy fluctuations
$\delta E^2 = (\langle E^2 \rangle - \langle E \rangle^2)/E^2$ as a
function of $c$ for folding simulations at different rates $r$. For
$r < r^s$ (dashed lines) the fluctuation curves appear to collapse and
reliable folding occurs in equilibrium. For $r^s < r < r^f$ (full lines)
fluctuations depend on $r$ and reliable folding occurs out of equilibrium.
Inset: Energy fluctuations at the equilibrium transition point
$c = c_2 = 0.0085$ as a function of $r/r^s$.



MATERIALS AND METHODS 
Simulation protocol 

Simulations are performed on polymer chains of spherical monomers, each
with diameter $\sigma$. We include two types of monomers: attractive
(green) and non-attractive (white). Interactions depend on the separation
$r_{ij}$ between monomers $i$ and $j$, and it is convenient to define the
normalized distance $\hat{r}_{ij} = r_{ij}/\sigma$. Interactions between
adjacent monomers are chosen to prevent the polymer chain from breaking,
while interactions between non-adjacent monomers are either purely
repulsive (for green-white or white-white interactions) or attractive (for
green-green interactions). More specifically, monomers that are adjacent
on the polymer chain experience a piecewise continuous potential
$\Phi_{cc}(\hat{r}_{ij})$ that consists of a purely repulsive
Lennard-Jones (RLJ) potential [3 for separations $\hat{r}_{ij} < 1$ and a
FENE potential [20] for separations $\hat{r}_{ij} > 1$:

$$ \Phi_{cc}(\hat{r}_{ij}) = \begin{cases} \epsilon\left(\hat{r}_{ij}^{-12} - 2\hat{r}_{ij}^{-6} + 1\right), & \hat{r}_{ij} < 1 \\ -\epsilon \log\left(1 - q^{-2}(\hat{r}_{ij} - 1)^2\right), & \hat{r}_{ij} > 1 \end{cases} \qquad (7) $$

where $\epsilon$ sets the energy scale and $q = 0.1$. This potential has a
minimum of zero at $\hat{r}_{ij} = 1$ and diverges at
$\hat{r}_{ij} = 1 + q$ to prevent adjacent monomers from unbinding.
Green-green interactions are described by a Lennard-Jones (LJ) potential



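$$ \Phi_{att}(\hat{r}_{ij}) = |E_c| \left(\hat{r}_{ij}^{-12} - 2\hat{r}_{ij}^{-6}\right) \qquad (8) $$

with energy depth $E_c < 0$ at $\hat{r}_{ij} = 1$, whereas green-white and
white-white interactions obey a RLJ potential

$$ \Phi_{rep}(\hat{r}_{ij}) = \begin{cases} \epsilon\left(\hat{r}_{ij}^{-12} - 2\hat{r}_{ij}^{-6} + 1\right), & \hat{r}_{ij} < 1 \\ 0, & \hat{r}_{ij} > 1 \end{cases} \qquad (9) $$

that provides a repulsive force when particles overlap and no force when
they do not overlap.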
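
The three pair potentials can be written as short functions of the normalized distance. This is a sketch under stated assumptions: the chain and repulsive potentials follow the forms described for Eqs. 7 and 9, while the attractive potential is assumed here to be a 12-6 Lennard-Jones form whose minimum value is $E_c$ at normalized distance 1, as the text describes. The function and argument names are hypothetical.

```python
import math


def phi_rlj(rhat, eps=1.0):
    """Purely repulsive LJ: positive for overlapping monomers, zero otherwise."""
    if rhat < 1.0:
        return eps * (rhat ** -12 - 2.0 * rhat ** -6 + 1.0)
    return 0.0


def phi_cc(rhat, eps=1.0, q=0.1):
    """Chain-connectivity potential: RLJ below rhat = 1, FENE above.

    Only defined for rhat < 1 + q; the FENE branch diverges at rhat = 1 + q
    (math.log raises ValueError at or beyond that separation).
    """
    if rhat < 1.0:
        return phi_rlj(rhat, eps)
    return -eps * math.log(1.0 - (rhat - 1.0) ** 2 / q ** 2)


def phi_att(rhat, E_c):
    """Assumed attractive LJ form with minimum value E_c (< 0) at rhat = 1."""
    return abs(E_c) * (rhat ** -12 - 2.0 * rhat ** -6)


print(phi_cc(0.95), phi_cc(1.05), phi_att(1.0, E_c=-0.5), phi_rlj(1.2))
```

Both branches of the chain potential vanish at normalized distance 1, so the piecewise definition is continuous there.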

Thermal fluctuations are included using off-lattice Brownian dynamics
simulations. The vector position $\mathbf{x}_i$ of each monomer $i$ is
determined at each time-step by the attractive and repulsive forces arising
from the potentials in Eqs. 7-9 and random forces arising from thermal
fluctuations. The equation of motion for monomer $i$ is

$$ m_i \frac{d^2 \mathbf{x}_i}{dt^2} = \mathbf{F}_i(t) - \eta \mathbf{v}_i - \nabla_{\mathbf{x}_i} \sum_j \left[\Phi_{cc}(\hat{r}_{ij}) + \Phi_{att}(\hat{r}_{ij}) + \Phi_{rep}(\hat{r}_{ij})\right], \qquad (10) $$

where $\mathbf{F}_i(t)$ is a Gaussian random force and $-\eta\mathbf{v}_i$
a damping force, with $\mathbf{v}_i$ denoting the velocity of monomer $i$
and $\eta$ the solvent viscosity. The Gaussian random force has zero mean
and a standard deviation proportional to $T/\eta$. We solve Eq. 10 using
standard numerical integration techniques [19] in the limit that monomer
mass $m_i \to 0$.
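
In the massless limit the equation of motion reduces to overdamped (Brownian) dynamics. The sketch below shows one common way to integrate such dynamics, a first-order Euler-Maruyama step with $k_B = 1$. The text does not specify the integrator, so this scheme, the function names, and the one-dimensional harmonic test force are assumptions made only for illustration.

```python
import math
import random


def brownian_step(x, force, T, eta, dt, rng=random):
    """One Euler-Maruyama step of overdamped Langevin dynamics.

    x     : list of scalar coordinates
    force : callable (i, x) -> deterministic force on coordinate i
    Each coordinate moves by force * dt / eta plus a Gaussian displacement
    of variance 2 * T * dt / eta.
    """
    noise = math.sqrt(2.0 * T * dt / eta)
    forces = [force(i, x) for i in range(len(x))]
    return [xi + fi * dt / eta + noise * rng.gauss(0.0, 1.0)
            for xi, fi in zip(x, forces)]


def harmonic(i, x, k=1.0):
    """Hypothetical test force: a spring pulling coordinate i to the origin."""
    return -k * x[i]


# With T = 0 the noise vanishes and the coordinate relaxes deterministically.
x = [1.0]
for _ in range(1000):
    x = brownian_step(x, harmonic, T=0.0, eta=1.0, dt=0.01)
print(x)
```

For the polymer model, `force` would return the negative gradient of the summed pair potentials with respect to each monomer coordinate.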
Folding simulations are conducted by starting with $E_c = 0$ and
decreasing $E_c$ linearly in time with rate $r$ at constant $T = 1$. In the
supporting information we include two movies from our simulations. These
show the behavior of a two dimensional polymer chain at
$r\eta\sigma^2/T = 10^{-7}$ where folding occurs reliably ("slowrate.mov")
and at $r\eta\sigma^2/T = 10^{-5}$ where a misfold occurs ("fastrate.mov").

Calculating energy landscapes and free energy

The energy landscape in Fig. 2 is created by running 20 separate folding
simulations at each of five rates
$r\eta\sigma^2/T = 10^{-8}, 10^{-7}, 10^{-6}, 10^{-5}$, and $10^{-4}$. Each
simulation explores the range $0 < c < 0.4$ and the energy landscape is
obtained by constructing a histogram over all observed states. We believe
that the landscape is sufficiently sampled since we observe that there is
very little difference at small $D$ and $R_g$ between the energy landscape
pictured in Fig. 2 and ones measured using only data from the smallest $r$.

The free energies in Fig. 3 are measured by slowly ramping to the desired
$c$-value with $r\eta\sigma^2/T = 5 \times 10^{-9}$, and then calculating a
histogram of the probability $P(E, D)$ to have energy $E$ and end-to-end
distance $D$ over $10^8$ time-steps for each $c$-value reported. The free
energy $F(E, D)$ is determined (within an additive constant) from the
probability via the relation $F(E, D) = -T \log P(E, D)$.
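
The histogram relation can be sketched in a few lines. The example below bins sampled $(E, D)$ pairs and converts the bin probabilities to free energies, fixing the additive constant by setting the lowest bin to zero. The bin widths, the function name, and the sample data are hypothetical choices for illustration.

```python
import math
from collections import Counter


def free_energy_histogram(samples, T, bin_E, bin_D):
    """Estimate F(E, D) = -T log P(E, D) from sampled (E, D) pairs.

    Returns a dict mapping bin indices (iE, iD) to F, shifted so that the
    most probable bin has F = 0. Bins that were never visited are absent,
    which corresponds to the unsampled (white) regions in the figures.
    """
    counts = Counter((math.floor(E / bin_E), math.floor(D / bin_D))
                     for E, D in samples)
    total = sum(counts.values())
    F = {key: -T * math.log(n / total) for key, n in counts.items()}
    F_min = min(F.values())
    return {key: value - F_min for key, value in F.items()}


# Hypothetical samples: three visits to one state, one visit to another.
samples = [(-5.0, 1.2)] * 3 + [(-3.0, 2.5)]
F = free_energy_histogram(samples, T=1.0, bin_E=1.0, bin_D=1.0)
print(F)
```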



Calculating $r^f$ and $r^s$

The limiting rates can be determined using equations similar to those in
Eqs. 1 and 3:

$$ r^f = (c_3 - c_2)\, r^* \exp\left(-\frac{\Delta F}{T}\right), \qquad (11) $$

$$ r^s = \int_{c_3}^{\infty} r^* \exp\left(-\frac{\Delta F'(c)}{T}\right) dc. \qquad (12) $$

These equations are derived for the simulation protocol where
$|E_c| = cT$ increases linearly in time to induce folding, with $T$
constant. The maximum waiting time is taken to infinity.

We first calculate $r^f$. The data in Fig. 3 gives $c_2 = 0.0085$ and
$c_3 = 0.01$. The free energy barrier $\Delta F/T$ is determined by
preparing the protein in topology $t^5$ at $c = c_2$ and measuring the
amount of time $t_f$ required to transition to topology $t^0$, averaged
over 100 trials. The free energy barrier is related to the transition time
by $t_f = \exp(\Delta F/T)/r^*$. We measure $t_f T/\eta\sigma^2 = 8400$,
where $\eta\sigma^2/T$ is the fundamental unit of time in the simulations.
Inserting these numbers into Eq. 11 yields
$r^f \eta\sigma^2/T = 1.8 \times 10^{-7}$.

The rate $r^s$ is determined by preparing the protein in topology $t^3$
and measuring the average time $t_s(c)$ required to transition to the
native topology $t^5$. We average $t_s(c)$ over 100 trials for each
$c$-value and it is plotted in Fig. 5. Since
$t_s(c) = \exp(\Delta F'(c)/T)/r^*$ we calculate
$r^s \eta\sigma^2/T = 3.0 \times 10^{-8}$ by direct integration of
$t_s(c)^{-1}$, according to Eq. 12. Contributions to the numerical value of
$r^s$ from $c > 0.02$ are negligible.

FIG. 5: Average time to transition from $t^3$ to $t^5$ as a function of
$c$.
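
The arithmetic behind the quoted limiting rate for the first bound can be checked directly, because the measured folding time already equals $\exp(\Delta F/T)/r^*$, so that bound is the width of the $c$ window divided by the folding time. The sketch below also shows how the second bound could be obtained from tabulated escape times by trapezoid integration of their inverse. The function names are hypothetical, and no escape-time data from the paper are reproduced here.

```python
def rate_f_from_folding_time(c2, c3, t_f):
    """Eq. 11 rewritten with t_f = exp(dF/T) / r*: r^f = (c3 - c2) / t_f."""
    return (c3 - c2) / t_f


def rate_s_from_escape_times(c_values, t_s_values):
    """Eq. 12 rewritten with t_s(c) = exp(dF'(c)/T) / r*.

    Trapezoid-rule integral of 1 / t_s(c) over the tabulated c values,
    which should start at c3 and extend until the integrand is negligible.
    """
    total = 0.0
    for k in range(1, len(c_values)):
        dc = c_values[k] - c_values[k - 1]
        total += 0.5 * dc * (1.0 / t_s_values[k] + 1.0 / t_s_values[k - 1])
    return total


# Values quoted in the text; times are in units of the simulation time-unit.
r_f = rate_f_from_folding_time(c2=0.0085, c3=0.01, t_f=8400.0)
print(f"r^f = {r_f:.2e}")
```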
SUPPORTING INFORMATION: SIMULATION RESULTS IN THREE DIMENSIONS

In the manuscript, simulation results were presented 
for a two dimensional model protein. Here we include 
results for three dimensions. These results exhibit similar 
behavior and support the theoretical predictions. 

We perform off-lattice Brownian dynamics simulations 
in three dimensions to simulate the folding process. We 
study the model protein pictured in Fig. [6] that consists
of 25 monomers, seven of which are attractive. In Fig. [6]
we also plot the protein energy landscape as a function of
the radius of gyration $R_g$ and end-to-end distance $D$,
each normalized by the monomer diameter $a$. There are
two minima at small $R_g$ and $D$, corresponding to the
topologies $t^{15}$ and $t^{16}$ pictured in the figure.

As in the two dimensional case, non-funneled energy
landscapes promote misfolding if the rate at which the
attractive strength $|E_c|$ is increased to induce folding is
sufficiently large. In Fig. [7](a) we plot the energy as a
function of $c = |E_c|/T$. For small rates the simulated
protein folds to the global energy minimum $t^{16}$. For
larger rates the system misfolds to the local minimum
$t^{15}$. In Fig. [7](b) we plot the probability to fold to the
native state $t^{16}$ as a function of rate. The protein folds
reliably below a normalized rate of $\sim 2 \times 10^{-6}$.

The limiting rate below which folding is reliable can be
predicted by measurements of free energy. In Fig. [8] we
plot the free energy as a function of end-to-end distance
$D$ and normalized energy $E/|E_c|$ for many different values
of $c$. In Fig. [8](a) the random coil state $t^0$ is the only
minimum in the free energy. For $c = 0.0067$, Fig. [8](b)
demonstrates that $t^{16}$ and $t^0$ have equal free energies.
In Fig. [8](c) the basins of attraction of the random coil $t^0$,
the native state $t^{16}$, and the metastable state $t^{15}$ are all
present. At this value of $c = 0.0072$, topology $t^{15}$ has a free
energy equal to that of $t^0$. For larger $c$, Fig. [8](d) demonstrates
that the protein has an increasing probability to populate
the basin of attraction for $t^{16}$, although the basin of
attraction for $t^{15}$ is still visible. From this series of free
energy plots, it is apparent that the simulated protein
possesses a single equilibrium transition at $c = c_2$ from
$t^0$ to $t^{16}$, and misfolds to $t^{15}$ are possible for $c > c_3$.

The rate $r^f$ is calculated using the values $c_2 = 0.0067$
and $c_3 = 0.0072$, along with the transition time $t^f$ from
$t^{16}$ to $t^0$ at $c = 0.0067$. We measure $t^f T/(\eta a^3) = 1850$,
averaged over one hundred trials. Given these values we
calculate $r^f \eta a^3/T = 2.7 \times 10^{-7}$.

The rate $r^s$ is calculated by measuring the transition
time $t^s(c)$ between topologies $t^{16}$ and $t^{15}$, which is shown
in Fig. [9]. Directly integrating this data for $c > c_3$ yields
$r^s \eta a^3/T = 2.3 \times 10^{-6}$.

Given the values of $r^f$ and $r^s$ we expect the protein to
fold reliably for $r \eta a^3/T < 2.3 \times 10^{-6}$, which is consistent
with the data in Fig. [7](b). In contrast to the two
dimensional simulations, we find $r^f < r^s$ and thus this
particular protein can only fold in equilibrium. Generally
we believe that the ordering of $r^f$ and $r^s$ can depend on
the length, sequence and energy scales of the protein.
FIG. 6: Energy landscape and relevant topologies for a three dimensional model protein, pictured in an extended state with
no bonds at the top of the figure. The inset is the full energy landscape, and the main figure contains a magnified view of
the compact states. The colorbar gives the total energy of the system normalized by the magnitude of the attraction strength
$|E_c|$. There are two distinct energy minima separated by barriers, and the topology of each minimum is pictured and labeled.
White regions correspond to protein conformations that are never sampled in the simulations.
FIG. 7: (a) Folding trajectories in simulations with identical initial conditions at four different rates. The normalized energy
$E/|E_c|$ is plotted as a function of $c$ and the final state is labeled by its topology. Slow rates find the native state $t^{16}$ reliably
whereas fast rates give rise to unreliable folding. (b) The probability $P_c$ of folding to the native state $t^{16}$ as a function of rate
$r$. Error bars are from sampling statistics. For $r \eta a^3/T < 2 \times 10^{-6}$ the system folds reliably. Vertical lines indicate the values
of $r^f$ and $r^s$ calculated in the text.




FIG. 8: Contour plots of the free energy $F/T$ normalized by the temperature, as a function of the normalized energy $E/|E_c|$
(horizontal axis) and end-to-end distance D (vertical axis) for four values of c. White regions correspond to protein conforma- 
tions that are never sampled in the simulations. 
FIG. 9: Average time to transition between $t^{16}$ and $t^{15}$ as a function of $c$.



